This webpage presents additional results of NeuronMotif. If you use any content of this website, please cite:
Wei, Zheng, et al. “NeuronMotif: Deciphering transcriptional cis-regulatory codes from deep neural networks.” bioRxiv (2021). doi:10.1101/2021.02.10.430606
The code of NeuronMotif is available on GitHub:
https://github.com/wzthu/NeuronMotif
Contact:
Zheng Wei, wei-z14(at)mails.tsinghua.edu.cn
Xiaowo Wang, xwwang(at)tsinghua.edu.cn
Department of Automation, Tsinghua University
Introduction
The goal of NeuronMotif
NeuronMotif is an algorithm that translates a convolutional neuron in a well-trained deep convolutional neural network (DCNN) into a motif grammar, including a motif dictionary and motif syntax (Figure I).
Understanding how a DCNN learns genomic sequence from the perspective of linguistics
As in learning English, recognizing words and phrases is the most fundamental step toward understanding the language of DNA sequences. Based on this common sense, we first need to know which words in a DNA sequence are memorized by a convolutional neuron, which is an excellent DNA sequence learner. Each convolutional neuron, through its corresponding convolutional neural network (CNN) substructure in the DCNN, reads fixed-length sequences as input; deeper neurons with larger receptive fields read longer sequences. The only indicator that reflects different responses to different words is the activation output value (Figure II). To investigate the differences between words with different activation levels, we can aggregate sequences with similar activation levels. Among all sequences, the few sequences with high activation values are more informative than the many sequences with low activation values (Fig. 2b in paper). Besides, higher activation levels tend to have a stronger impact on the activation levels of downstream neurons, which may eventually influence the genome function annotations at the end of the DCNN, such as transcription factor (TF) binding and histone marks (HMs) (Fig. 2a in paper). Therefore, the importance and certainty of a word occurring in the input sequence are determined by the output (y). Given the value of y, it is easy to know which characters (bases) in the sequence x are as important as the root or affix of a word (Figure II). This can be implemented by activation-level-weighted sampling from a collection of activated fixed-length sequences, because we know the relation y=f(x). At each position of the sampled sequences, unchanged, slightly changed, and random bases are likely to be roots, affixes, and blanks (placeholders), respectively.
Such word-form characteristics are reflected in the position probability matrix (PPM) obtained by aggregating all sampled sequences.
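The activation-level-weighted sampling and PPM aggregation described above can be sketched as follows. This is a minimal illustration, not the exact NeuronMotif procedure: the toy sequences, activation values, and sample size are all assumptions for demonstration.

```python
import random
from collections import Counter

def weighted_ppm(seqs, activations, n_samples=1000, seed=0):
    """Sample sequences with probability proportional to their activation,
    then aggregate the samples into a position probability matrix (PPM)."""
    rng = random.Random(seed)
    samples = rng.choices(seqs, weights=activations, k=n_samples)
    length = len(seqs[0])
    ppm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in samples)
        ppm.append({b: counts.get(b, 0) / n_samples for b in "ACGT"})
    return ppm

# Toy example: the two high-activation sequences share the core "CAGGT".
seqs = ["ACAGGTAT", "TCAGGTCC", "GGTTACGA"]
acts = [0.9, 0.8, 0.05]  # the last sequence barely activates the neuron
ppm = weighted_ppm(seqs, acts)
# Positions inside the shared core (e.g. position 1, 'C') are near-deterministic
# in the PPM, while positions outside it stay closer to uniform.
```

Because sampling is weighted by activation, the rarely activated third sequence contributes little, so the core bases dominate the resulting PPM.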
The key difference between using a deep learning (DL) model to learn natural language and DNA sequence is that humans have an English dictionary, but there is no DNA dictionary. DNA is an entirely foreign language that we must learn the way a baby acquires its first language. The basic elements in current natural language processing DL models are words; the basic elements in genome annotation DL models are letters, i.e., bases. Therefore, a DCNN has to extract both words (motif sequences) to build a dictionary (motif database) and syntax (motif arrangement) to form sentences (sequences), so that it can correctly evaluate the grammar of an input sequence.
Main idea of NeuronMotif
Each neuron in a DCNN represents one or more motifs (patterns) that are fixed in length because the receptive field has a fixed length (Figure IIIa). The CNN substructure of the neuron determines the motifs and the function y=f(x) between input and output. In a typical CNN structure, a deeper convolutional neuron usually corresponds to a longer motif, which is combined and transformed from the motifs of shallower neurons. In Figure IIIa, the sequence “ACAGGTAT” is fed to the CNN substructure. It activates the neurons that match its subsequences in each layer and finally activates the output neuron. It activates the output neuron because it contains the key subsequence “CAGGT”; a sequence without this key subsequence may fail to activate some of those neurons.
The simplest example of a shifting latent variable is a max-pooling step with both pooling size and stride equal to 2 (Figure IIIb). Here, we consider a convolutional neuron with a 2-layer (L=2) CNN substructure that recognizes two sequences, both matching the key consensus pattern “CAGGT” but with a 1-bp offset. In layer 1, each neuron's kernel motif scans the sequences from left to right and generates a corresponding row (channel) of activation signals, so the two sequences activate the neurons differently. In general, the correlated key neurons (the remaining neurons are hidden in Figure IIIb) are strongly activated with a 1-bp offset, while the uncorrelated ones are silent or weakly and randomly activated. Max-pooling merges adjacent signals by keeping the larger one. As a result, the signals of the two sequences, passed from the upstream network with a 1-bp offset, become similar feature maps, which finally produce similar activation signals in layer 2. Here, the uncorrelated neuron activations are too low to affect the activation of the layer-2 kernel. From this forward-propagation process, we know that this layer-2 neuron represents two motifs. If sequences are sampled under the condition of position 1 and position 2 separately, the motif of the layer-2 neuron matches the known ZEB1 motifs MA0103.3 and MA0103.2, respectively. However, if sequences are sampled randomly without the position constraint, the motif of the layer-2 neuron is a mixture of the two motifs (bottom of Figure IIIb). This incorrect motif mixture has to be decoupled, because the position condition (shifting latent variable) cannot be recovered without sampling separately.
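The merging of 1-bp-shifted signals by max-pooling can be shown with a toy numeric example. The activation values below are hypothetical, not taken from a trained model.

```python
def max_pool(signal, size=2, stride=2):
    """1-D max-pooling: keep the larger value in each non-overlapping window."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, stride)]

# Activation row of one layer-1 channel for two sequences whose key
# subsequence is offset by 1 bp: the peak sits at position 2 vs. position 3.
seq_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
seq_b = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print(max_pool(seq_a))  # [0.0, 1.0, 0.0]
print(max_pool(seq_b))  # [0.0, 1.0, 0.0] -- identical after pooling
```

After pooling, the two shifted inputs become indistinguishable feature maps, which is exactly why a downstream neuron ends up representing a mixture of shifted motifs.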
Analysis of existing DCNN interpretation methods
Most existing methods seek to interpret a DCNN by detecting, in various ways, the correlation between the model's genome function output and each single input nucleotide base.
The DeepSEA paper [1] used a perturbation-based method to interpret the model (Figure IVa). This method changes a nucleotide base in the sequence and measures the effect on the output neuron. If the activation value of the output neuron increases or decreases dramatically, the position is regarded as important for the prediction target of that output neuron. In this way, it can also evaluate the relative importance of the four bases at the same position. However, the result depends on the other nucleotides in the sequence: for different sequences, the result at the same position may be totally different.
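A perturbation scan of this kind (in-silico mutagenesis) can be sketched as follows. The scoring function here is a placeholder standing in for a trained model's output neuron, not DeepSEA's actual model.

```python
def mutagenesis_scores(seq, score_fn):
    """For each position, substitute all four bases and record the change
    in the model output relative to the original sequence."""
    base_score = score_fn(seq)
    effects = []
    for i in range(len(seq)):
        row = {}
        for b in "ACGT":
            mutated = seq[:i] + b + seq[i + 1:]
            row[b] = score_fn(mutated) - base_score
        effects.append(row)
    return effects

# Placeholder "model": scores whether the sequence contains the core "CAGGT".
def toy_score(seq):
    return 1.0 if "CAGGT" in seq else 0.0

effects = mutagenesis_scores("ACAGGTAT", toy_score)
# Disrupting a core base (e.g. position 2, 'A'->'T') drops the score by 1,
# while mutating a flanking base (e.g. position 0) leaves it unchanged.
```

The sketch also illustrates the context dependence noted above: the effect assigned to each position is computed relative to one particular sequence, so the same position can score differently in a different sequence.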
The other two currently widely used strategies are activation maximization [2] and backpropagation [3-4] (Figure IVb and c), both borrowed from computer vision (CV). Adapted activation maximization methods simply aggregate the sequences with approximately maximal activations to show the importance of each nucleotide base. However, this does not work in deeper layers because the key motif sequences are not always aligned within the sequences (Figure IVb); simply stacking all sequences generates a mixture of the motifs. Another problem is how to choose the threshold for filtering sequences, since the results differ considerably under different thresholds. Backpropagation-based methods, such as adapted saliency map methods [4] and DeepLIFT [3], use the importance score (IS) of each nucleotide base, obtained by the backpropagation algorithm or a modified version of it, to represent the motif in a known functional sequence. Besides the motif in the input sequence, the IS mixes in motifs located at other potential positions (Figure IVc). Hence, these adapted CV methods are not well suited to genome function studies.
The key difference between interpreting DCNN models of images and of genome sequences is that humans can recognize a mixture of objects in an image but cannot distinguish a mixture of motif sequences. In CV, interpretation methods are usually applied to tasks like object localization, which do not require pixel-level resolution. For genome sequences, however, if a sequence or motif deviates by even one base, the result is meaningless. Hence, we developed NeuronMotif to decouple the mixtures in DCNNs.
Motif grammar examples
In this work, we use two datasets, from the DeepSEA paper and the Basset paper, and train four models:
- DeepSEA (trained by DeepSEA dataset)
- DD-10 (trained by DeepSEA dataset)
- Basset (trained by Basset dataset)
- BD-10 (trained by Basset dataset)
Here, we show some examples of the motifs decoupled from these models. See the next section for details.
Neurons in shallow layers
The receptive field of neurons in shallow layers is small. For example, the receptive fields of layer 1 and layer 2 in the Basset model are 19 bp and 51 bp, and those of layer 1 and layer 2 in the DeepSEA model are 8 bp and 39 bp. Usually, one neuron represents only one motif with shift diversity, so a single round of decoupling is enough for most shallow neurons. After applying NeuronMotif to each neuron, we used Tomtom to match the resulting motifs to JASPAR motifs. In shallow layers, the computation time is acceptable, so we do not use smoothing methods and we slice the motif from the whole receptive field. The results are displayed separately for each motif in tomtom.html files; when the tomtom.html files are not large, we merge them into one file.
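The receptive fields quoted above follow from the standard recurrence for stacked convolution and pooling layers; a small helper reproduces them from the model definitions given later on this page (convolutions are assumed to have stride 1, and pooling stride equals pooling size, as in the code).

```python
def receptive_field(layers):
    """Receptive field (in bp) of one neuron after a stack of
    (kernel_size, stride) layers, accumulating rf += (k - 1) * jump."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Basset: conv(19) -> pool(3) -> conv(11)
print(receptive_field([(19, 1)]))                   # layer 1: 19 bp
print(receptive_field([(19, 1), (3, 3), (11, 1)]))  # layer 2: 51 bp

# DeepSEA: conv(8) -> pool(4) -> conv(8)
print(receptive_field([(8, 1)]))                    # layer 1: 8 bp
print(receptive_field([(8, 1), (4, 4), (8, 1)]))    # layer 2: 39 bp
```

The `jump` variable tracks how many input bases one step in the current feature map spans, which is why each pooling layer multiplies the growth rate of the receptive field.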
Neurons in deep layers
The receptive field of neurons in deep layers is large. For example, the receptive field of layer 10 is 140 bp in the BD-10 model and 144 bp in the DD-10 model. One neuron in a deep layer often represents more than one motif with shift diversity. In addition to the procedure used for shallow layers, we applied the decoupling algorithm in NeuronMotif twice for each neuron in layer 10. The total number of shift IDs is 256 for the layer-10 neurons of BD-10 and DD-10. For computational efficiency, we show the 256 motifs (Figure VI) and the motif matching results separately. The motifs are sliced from the whole receptive field before matching to the JASPAR database.
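The motif counts listed in the tables below are simply the number of neurons in a layer multiplied by the number of shift IDs per neuron. We assume here, consistent with the tables, that one decoupling round yields up to 16 shift IDs in layer 10 and two rounds yield up to 256 = 16 × 16.

```python
def max_motifs(n_neurons, shift_ids):
    """Upper bound on decoupled motifs in a layer:
    neurons * shift IDs per neuron."""
    return n_neurons * shift_ids

print(max_motifs(1280, 16))   # DD-10 layer 10, one decoupling round: 20480
print(max_motifs(1280, 256))  # DD-10 layer 10, two rounds: 327680
print(max_motifs(512, 256))   # BD-10 layer 10, two rounds: 131072
```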
Motif grammar gallery
In this work, we used two datasets: DeepSEA and Basset. For each dataset, we trained DCNN models and used NeuronMotif to annotate all shallow and deep convolutional neurons.
DeepSEA Dataset
This dataset is obtained from the work:
Zhou, Jian, and Olga G. Troyanskaya. “Predicting effects of noncoding variants with deep learning–based sequence model.” Nature methods 12.10 (2015): 931-934.
Input is the \(1000\times4\) one-hot code of a DNA sequence.
Output is the boolean label of the DNA sequence, marking whether it overlaps with peaks of 919 types of ChIP-seq (TF and HM) and DNase-seq experiments from different cell types.
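The one-hot encoding of the input can be sketched as below. A common A/C/G/T column order is assumed here; the original preprocessing code may use a different order.

```python
def one_hot(seq):
    """Encode a DNA sequence as a len(seq) x 4 one-hot matrix (columns A, C, G, T)."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    return [table[b] for b in seq]

x = one_hot("ACGT")
# A 1000-bp input sequence thus becomes a 1000 x 4 matrix fed to the model.
```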
DeepSEA model
Model structure is
Convolutional layer \(kernel\_320 \times size\_8\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_480 \times size\_8\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_960 \times size\_8\)
Max-pooling layer \(size\_4\)
# Assumed TensorFlow/Keras imports (the original page omits them)
from tensorflow.keras.layers import (Input, Conv1D, Activation, MaxPooling1D,
                                     Dropout, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.activations import relu

input_bp = 1000
conv_kernel_size = 8
pool_kernel_size = 4
maxnorm = MaxNorm(max_value=0.9, axis=0)
l1l2 = l1_l2(l1=0, l2=1e-6)

# ReLU with a small threshold to clip near-zero activations
def crelu(x, alpha=0.0, max_value=None, threshold=1e-6):
    return relu(x, alpha, max_value, threshold)

batch_size = 16

seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(320, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seqInput)
seq = Activation(crelu)(seq)
seq = MaxPooling1D(pool_size=pool_kernel_size, strides=pool_kernel_size)(seq)
seq = Dropout(0.2)(seq)
seq = Conv1D(480, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = MaxPooling1D(pool_size=pool_kernel_size, strides=pool_kernel_size)(seq)
seq = Dropout(0.2)(seq)
seq = Conv1D(960, conv_kernel_size, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = Dropout(0.5)(seq)
seq = Flatten()(seq)
seq = Dense(925, kernel_regularizer=l1l2, kernel_constraint=maxnorm)(seq)
seq = Activation(crelu)(seq)
seq = Dense(919, kernel_regularizer=l1l2, kernel_constraint=maxnorm,
            activity_regularizer=l1_l2(l1=1e-8, l2=0))(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs=[seqInput], outputs=[seq])
We applied NeuronMotif to this DCNN and ran its decoupling algorithm only once.
| Layer | Decoupling runs | Neurons | Shift IDs per neuron | Total motifs | Result |
|-------|-----------------|---------|----------------------|--------------|--------|
| 1 | 1 | 320 | up to 1 | up to 320 | link |
| 2 | 1 | 480 | up to 4 | up to 1920 | link |
| 3 | 1 | 960 | up to 16 | up to 15360 | link |
DD-10 model
Model structure is
- Convolutional layer \(kernel\_64 \times size\_7\)
Convolutional layer \(kernel\_80 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_128 \times size\_3\)
Convolutional layer \(kernel\_160 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_256 \times size\_3\)
Convolutional layer \(kernel\_320 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_512 \times size\_3\)
Convolutional layer \(kernel\_640 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_1024 \times size\_3\)
Convolutional layer \(kernel\_1280 \times size\_3\)
Max-pooling layer \(size\_2\)
# Assumed TensorFlow/Keras imports (the original page omits them)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 1000
seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(64, 7)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(80, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(160, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(320, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(640, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(1024, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(1280, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Flatten()(seq)
seq = Dense(925)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.2)(seq)
seq = Dense(919)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs = [seqInput], outputs = [seq])
We applied NeuronMotif to this DCNN, running its decoupling algorithm once for each layer and a second time for layer 10.
| Layer | Decoupling runs | Neurons | Shift IDs per neuron | Total motifs | Result |
|-------|-----------------|---------|----------------------|--------------|--------|
| 1 | 1 | 64 | up to 1 | up to 64 | link |
| 2 | 1 | 80 | up to 1 | up to 80 | link |
| 3 | 1 | 128 | up to 2 | up to 256 | link |
| 4 | 1 | 160 | up to 2 | up to 320 | link |
| 5 | 1 | 256 | up to 4 | up to 1024 | link |
| 6 | 1 | 320 | up to 4 | up to 1280 | link |
| 7 | 1 | 512 | up to 8 | up to 4096 | link |
| 8 | 1 | 640 | up to 8 | up to 5120 | link |
| 9 | 1 | 1024 | up to 16 | up to 16384 | link |
| 10 | 1 | 1280 | up to 16 | up to 20480 | link |
| 10 | 2 | 1280 | up to 256 | up to 327680 | link |
Basset Dataset
This dataset is obtained from the work:
Kelley, David R., Jasper Snoek, and John L. Rinn. “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.” Genome research 26.7 (2016): 990-999.
Input is the \(600\times4\) one-hot code of a DNA sequence.
Output is the boolean label of the DNA sequence, marking whether it overlaps with peaks of 164 types of DNase-seq experiments from different cell types.
Basset model
Model structure is
Convolutional layer \(kernel\_300 \times size\_19\)
Max-pooling layer \(size\_3\)
Convolutional layer \(kernel\_200 \times size\_11\)
Max-pooling layer \(size\_4\)
Convolutional layer \(kernel\_200 \times size\_7\)
Max-pooling layer \(size\_4\)
# Assumed TensorFlow/Keras imports (the original page omits them)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 600
seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(300, 19)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=3)(seq)
seq = Conv1D(200, 11)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=4)(seq)
seq = Conv1D(200, 7)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(pool_size=4)(seq)
seq = Flatten()(seq)
seq = Dense(1000)(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.3)(seq)
seq = Dense(1000)(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.3)(seq)
seq = Dense(164)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs = [seqInput], outputs = [seq])
We applied NeuronMotif to this DCNN and ran its decoupling algorithm only once.
| Layer | Decoupling runs | Neurons | Shift IDs per neuron | Total motifs | Result |
|-------|-----------------|---------|----------------------|--------------|--------|
| 1 | 1 | 300 | up to 1 | up to 300 | link |
| 2 | 1 | 200 | up to 3 | up to 600 | link |
| 3 | 1 | 200 | up to 12 | up to 2400 | link |
BD-10 model
Model structure is
- Convolutional layer \(kernel\_64 \times size\_7\)
Convolutional layer \(kernel\_64 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_128 \times size\_3\)
Convolutional layer \(kernel\_128 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_256 \times size\_3\)
Convolutional layer \(kernel\_256 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_384 \times size\_3\)
Convolutional layer \(kernel\_384 \times size\_3\)
Max-pooling layer \(size\_2\)
- Convolutional layer \(kernel\_512 \times size\_3\)
Convolutional layer \(kernel\_512 \times size\_3\)
# Assumed TensorFlow/Keras imports (the original page omits them)
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

input_bp = 600
seqInput = Input(shape=(input_bp, 4), name='seqInput')
seq = Conv1D(64, 7)(seqInput)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(64, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(128, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(256, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(384, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(384, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = MaxPooling1D(2)(seq)
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Conv1D(512, 3)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Flatten()(seq)
seq = Dense(1024)(seq)
seq = BatchNormalization()(seq)
seq = Activation('relu')(seq)
seq = Dropout(0.2)(seq)
seq = Dense(164)(seq)
seq = Activation('sigmoid')(seq)
model = Model(inputs = [seqInput], outputs = [seq])
We applied NeuronMotif to this DCNN, running its decoupling algorithm once for each layer and a second time for layer 10.
| Layer | Decoupling runs | Neurons | Shift IDs per neuron | Total motifs | Result |
|-------|-----------------|---------|----------------------|--------------|--------|
| 1 | 1 | 64 | up to 1 | up to 64 | link |
| 2 | 1 | 64 | up to 1 | up to 64 | link |
| 3 | 1 | 128 | up to 2 | up to 256 | link |
| 4 | 1 | 128 | up to 2 | up to 256 | link |
| 5 | 1 | 256 | up to 4 | up to 1024 | link |
| 6 | 1 | 256 | up to 4 | up to 1024 | link |
| 7 | 1 | 384 | up to 8 | up to 3072 | link |
| 8 | 1 | 384 | up to 8 | up to 3072 | link |
| 9 | 1 | 512 | up to 16 | up to 8192 | link |
| 10 | 1 | 512 | up to 16 | up to 8192 | link |
| 10 | 2 | 512 | up to 256 | up to 131072 | link |
References
[1] Zhou, Jian, and Olga G. Troyanskaya. “Predicting effects of noncoding variants with deep learning–based sequence model.” Nature methods 12.10 (2015): 931-934.
[2] Kelley, David R., Jasper Snoek, and John L. Rinn. “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.” Genome research 26.7 (2016): 990-999.
[3] Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. “Learning important features through propagating activation differences.” International Conference on Machine Learning. PMLR, 2017.
[4] Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. “Deep inside convolutional networks: Visualising image classification models and saliency maps.” arXiv preprint arXiv:1312.6034 (2013).